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1.  Introduction 


A  quantum  mechanical  (QM)  approach  to  materials  science  provides  a  gold  standard  atomistic 
picture  of  the  mechanisms  responsible  for  a  range  of  phenomena  seen  in  macroscopic  and 
experimental  situations.  The  need  to  understand  materials  science  problems  from  atomistic  to 
macroscale  was  the  impetus  for  the  US  Army  Research  Laboratory  to  initiate  the  Enterprise  for 
Multiscale  Material  Research.  This  long-term  project  attempts  to  redefine  how  materials  science 
questions  are  posed  and  solved.  To  support  this  goal,  current  state-of-the-art  QM  capabilities 
need  to  be  extended  in  the  number  of  atoms  that  can  be  treated  and  the  length  scale  of  the 
dynamics  that  can  be  simulated.  This  extension  is  referred  to  as  large-scale  QM — large  both 
spatially  and  temporally.  This  will  enable  fundamental  advances  in  the  understanding  of 
materials  science  problems. 

This  is  the  first  in  a  series  of  reports  that  will  be  required  to  realize  the  new  capabilities  needed  to 
model  realistic  materials  science  problems.  There  are  several  areas  of  QM  theory  that  may  be 
able  to  address  these  material  properties,  which  present  unique  strengths.  We  hope  to  exploit 
these  advantages  to  gain  new  insight  into  the  basic  physical  processes  governing  macroscopic 
material  properties  from  atomistic  simulations.  The  first  focus  will  be  on  expanding  the  linear 
scaling  methods  currently  implemented  within  the  atomistic  simulation  package  CP2K.^  Future 
work  may  expand  on  this  work  in  linear  scaling  or  adding  functionality  to  the  basic  linear  scaling 
approach.  Other  methods  to  achieve  linear  scaling  involve  orbital  transform  free  methods, 
fragmentation  methods,  and  new  molecular  dynamics  ensembles.  These  may  also  be  investigated 
as  they  present  advantages  outside  the  current  focus. 

In  the  past  decade,  CP2K  has  become  a  standard  tool  of  the  computational  chemistry  community 
for  efficient  QM  molecular  dynamics.  One  area  where  CP2K  has  been  a  leader  in  the  field  is  in 
linear  scaling  methods.  CP2K  mixes  plane  waves  and  Gaussian  descriptions  of  atoms  and  their 
interactions.  This  explicitly  lends  itself  well  to  linear  scaling  methods  based  on  sparse  matrix 
density  functional  methods.  One  approach  is  to  use  purification  of  the  Kohn-Sham  matrix.  This 
is  currently  implemented  in  CP2K,  though  there  have  been  recent  developments  in  this  area  that 
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could  provide  significant  advantages  over  the  current  state  of  the  art.  ’ 

Purification  is  analogous  to  preconditioning  in  linear  algebra,  but  the  transformation  is  intended 
to  resolve  the  Kohn-Sham  matrix  such  that  it  becomes  idempotent.  This  relies  on  predefined 
polynomials,  which  must  balance  the  order  of  the  polynomial  with  the  number  of  iterations 
needed  to  create  an  idempotent  matrix.  A  higher-order  polynomial  requires  more  matrix-matrix 
multiplications  but  may  allow  for  fewer  iterations.  It  has  been  shown  that  relatively  low-order 
polynomials  perform  well."^  CP2K  currently  employs  a  fourth-order  polynomial,  which 
monotonically  transforms  the  matrix  to  idempotency  and  is  known  as  the  fourth-order  trace 
resetting  (TRS4)  method.^ 
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A  recently  published  method  provides  a  framework  for  using  a  chosen  polynomial  but  with 
nonmonotonic  constraints  on  the  polynomial  that  allow  treatment  of  a  low  band  gap  system.  The 
nonmonotonic  approach  we  chose  to  implement  in  CP2K  was  based  on  the  second-order  trace 
conserving  (TC2)  method,^  which  allows  for  weighting  of  the  polynomials  based  on  the  highest 
occupied  molecular  orbital  (HOMO)  and  lowest  unoccupied  molecular  orbital  (LUMO)  energies. 
This  allows  the  polynomials  to  adapt  as  the  matrix  approaches  idempotentcy.  The  method 
implemented  is  a  second-order  polynomial  reducing  the  amount  of  memory  required  and  the 
matrix-matrix  multiplications.  This  allows  for  a  larger  number  of  atoms  to  be  treated  than 
currently  available  methods  in  CP2K.  The  nonmonotonic  nature  allows  the  purification  to 
perform  as  well  as  the  higher-order  methods  and  even  outperform  the  current  state  of  the  art  at 
low  band  gaps. 

The  nonmonotonic  TC2  method  therefore  allows  for  a  lower  memory  footprint  for  the 
calculation.  The  memory  savings  come  from  having  to  save  each  multiplication  intermediate. 
Generally,  for  a  given  order  of  polynomial  (N),  N-1  intermediates  must  be  saved.  The  benefit  of 
this  can  be  seen  in  the  largest  calculation  capable  of  being  run.  A  single  molecular  dynamics  step 
on  lOOK  atoms  can  be  treated  with  TC2  but  not  using  the  current  TRS4  method  implemented  in 
CP2K.  Another  advantage  of  nonmonotonic  TC2  is  the  ability  to  adapt  the  purification  to  the 
band  gap.  This  is  important  as  the  difference  between  the  HOMO  and  LUMO  orbitals  becomes 
very  small.  As  the  band  gap  becomes  small,  the  purification  must  differentiate  how  to  spread  out 
the  values  of  matrix,  which  becomes  difficult  for  an  arbitrarily  chosen  polynomial.  The 
nonmonotonic  approach  allows  the  flexibility  in  the  transformation  to  resolve  the  small 
eigenvalues. 

CP2K  is  a  freely  available  program,  combining  several  other  open  source  libraries  to  form  a 
complete  simulation  package.  Our  implementation  of  the  TC2  method  in  CP2K  is  most  easily 
understood  by  looking  at  the  following  pseudo  code.  A  full  subroutine  is  included  in  the 
appendix.  However,  it  uses  many  conventions  and  variables  specific  to  CP2K  and  is  therefore 
hard  to  understand  without  context. 

PSEUDO  CODE 

Initiate  from  self  consistent  field  convergence 

INPUTS:  Kohn  Sham  Matrix  (KS  matrix),  #  occupied  orbitals  (nelectron) , 
value  of  highest  occupied  molecular  orbital  (homo)  ,  value  of  lowest 
unoccupied  molecular  orbital  (lumo) 

Initialize:  X  =  (eps_n*I  -  H) / (eps_n  -  eps_0)  ...  H*  =  S^-l/2*H*S^-l/2 
Get  largest/smallest  eigenvalues  for  scaling 
Scale  KS  matrix 
Get  X*X 

Set  alpha,  beta  and  betaB  =  1 
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DO  I=l,Max_iter 

IF (do_non_monotonic)  THEN 

IF  (  trace_fx  >  nelectron  )  THEN 
alpha=2/ (2-beta) 

Xn+I  =  (aX+  (1-a) I) ^2 
beta  =(alpha*beta  +l-alpha) **2 
betaB= (alpha*betaB+l-alpha) **2 
ELSE 

alpha=2/ (I+betaB) 

Xn+I  =  2aX-a^2*X^2 

beta  =2 *alpha*beta  - (alpha*beta  ) **2 
betaB=2 *alpha*betaB- (alpha*betaB) **2 

ELSE 

IF  (trace^fx  >  nelectron)  THEN 
Xn+I  =  X*X 
ELSE 

Xn+I  =  2X-X*X 
ENDIF 
ENDIF 

Evaluate  error  associated  with  purification  step.  If  below 
threshold,  exit  loop 
ENDDO 

Compute  homo/lumo 

Export  X  to  an  updated  density 

Return  to  self  consistent  field  convergence 

In  this  report,  the  nonmonotonic  TC2  method  of  purification  is  compared  with  the  TRS4  method 
already  in  CP2K.  A  series  of  calculations  were  run  to  gauge  how  effective  the  TC2  is  for  low  and 
high  band  gap  systems.  The  TC2  method  lowers  the  memory  requirements  compared  with  TRS4, 
which  allows  larger  systems  to  be  simulated,  as  the  current  limiting  factor  on  a  commodity 
supercomputer  tends  to  be  the  amount  of  memory  per  core.  The  nonmonotonic  nature  of  the  TC2 
allows  for  faster  convergence  over  traditional  TC2  methods  and  is  competitive  with  the  higher 
order  polynomial  method  TRS4.  Further,  the  nonmonotonic  TC2  outperforms  TRS4  for  low 
band  gap  systems  where  the  purification  can  be  adjusted  based  on  the  HOMO-LUMO  gap  in 
nonmonotonic  TC2,  leading  to  quicker  idempotency  of  the  Kohn-Sham  matrix. 


2.  Results  and  Conclusions 


Figure  1  shows  a  comparison  of  TC2  and  TRS4  in  which  the  time  for  each  band  gap  is 
normalized  to  TRS4.  Two  systems  were  used  to  span  the  range  of  band  gaps  shown  with  real 
chemical  examples  rather  than  artificial  matrix  inputs.  These  are  shown  in  Fig.  2.  The  O.X  band 
gap  was  a  1024  water  case.  This  water  box  was  constructed  by  an  NVT  (number,  volume, 
temperature)  simulation  of  TIP4P  waters  at  300  K,  part  of  the  test  cases  for  CP2K.  Subsequent 
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calculations  were  performed  for  extensions  of  an  alpha  helix  of  alanine  moleeules.  This  was 
chosen  because  the  band  gap  is  lowered  as  the  alpha  helix  is  extended.  This  is  an  effect  of  the 
induced  dipoles  on  the  alanine  aligning  in  the  turns  of  the  alpha  helix.  Therefore,  by  adding  more 
repeating  units,  we  ean  simulate  a  particular  band  gap.  The  chart  represents  a  20,  40,  80  alanine 
repeat.  These  timings  were  run  on  a  CRAY  XE6  on  512  proeessors.  These  represent  the  average 
of  two  runs  for  each  system  and  may  be  influeneed  by  normal  variations  in  timings  due  to 
eommunieation  bottleneeks.  Another  test  probed  how  large  of  a  system  was  feasible.  A  test  of 
104K  atoms  was  constructed  by  creating  a  supereell  from  multiples  of  the  1024  water  box.  Both 
TRS4  and  TC2  were  run  on  2048  proeessors.  The  TRS4  method  failed  to  run  from  an  out-of- 
memory  error,  while  the  TC2  was  able  to  eomplete  an  energy  evaluation  with  a  double-zeta 
valence  polarized  basis  set.  This  stems  from  the  fact  that  the  TC2  method  is  able  to  use  two 
fewer  eopies  of  the  Kohn-Sham  matrix  to  perform  the  purifieation,  making  a  signifieant 
difference  at  high  atom  eounts.  It  may  also  explain  some  of  the  timing  differenees. 


Fig.  1  Comparison  of  TRS4  and  TC2  purification  scheme  times  in  CP2K.  Timings  are  normalized 
to  TRS4  for  each  band  gap. 
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Fig.  2  Graphical  representation  of  the  1024  water  box  (left)  and  a  40-subunit  repeat  of  alanine 
in  an  alpha  helix  (right) 

The  TC2  method  has  been  shown  to  provide  signifieant  advaneements  to  the  state  of  art  in 
CP2K.  There  are  some  issues  with  premature  identification  of  an  idempotent  matrix  for  the  water 
case  resulting  in  a  4-kcal/mol  error  for  any  size  of  water  cluster.  By  lowering  the  eps_threshold 
(a  numerical  threshold  for  when  to  drop  elements  from  the  sparse  matrix  representation)  slightly 
compared  with  TRS4,  the  TC2  method  reproduces  the  TRS4  result.  The  deviation  was  consistent 
over  an  eps_threshold  range  of  lE-5  to  lE-7.  This  deviation  of  accuracy  between  the  methods 
was  not  seen  on  diamond,  alpha  helixes,  or  graphene.  It  may  be  that  the  small  contributions  of 
the  hydrogen  bonding  network  are  not  being  preserved  as  well  as  in  the  nonmonotonic  approach. 
Current  error  estimation  is  designed  for  speed  over  accuracy,  and  this  may  not  be  an  acceptable 
trade-off  for  the  nonmonotonic  polynomials.  Euture  work  might  explore  possible  improvements 
such  a  using  a  Euclidean  norm  rather  than  a  Erobenius  norm  in  error  estimation  to  alleviate 
premature  idempotency  identification  for  the  nonmonotonic  method. 
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Appendix.  The  Subroutine  and  Function  Added  to  CP2K 


This  appendix  appears  in  its  original  form,  without  editorial  change. 
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Insignificant  changes  were  made  to  input  parsing  routines  to  allow  execution  of  these 
subroutines. 


***************************************************************************** 

>  \brief  compute  the  density  matrix  using  a  non  monotonic  trace  conserving  algorithm 

>  \par  History 

>  2013.08  created  [Jonathan  Mullin] 

>  \author  Jonathan  Mullin 

***************************************************************************** 
SUBROUTINE  density_matrix_tc2 (matrix_p,  matrix_ks,  matrix_s_sqrt_inv,  & 

nelectron,  threshold,  e_homo,  e_lumo,  e_mu,  & 
dynamic_threshold, non_monotonic, & 
matrix_ks_deviation, max_iter_lanczos , & 
eps_lanczos,  error) 

TYPE (cp_dbcsr_type) ,  INTENT (INOUT) 

TYPE (cp_dbcsr_type) ,  INTENT (IN) 

INTEGER,  INTENT (IN) 

REAL (KIND=dp) ,  INTENT (IN) 

REAL (KIND=dp) ,  INTENT ( INOUT ) 

LOGICAL,  INTENT (IN),  OPTIONAL 
LOGICAL,  INTENT (IN),  OPTIONAL 
TYPE (cp_dbcsr_type) ,  INTENT (INOUT) 

INTEGER,  INTENT (IN) 

REAL (KIND=dp) ,  INTENT (IN) 

TYPE (cp_error_type) ,  INTENT (inout) 


:  matrix_p 

:  matrix_ks,  matrix_s_sqrt_inv 
:  nelectron 
:  threshold 

:  e_homo,  e_lumo,  e_mu 
:  dynamic_threshold 
:  non_monotonic 
:  matrix_ks_deviation 
:  max_iter_lanczos 
:  eps_lanczos 
:  error 


CHARACTER (LEN=* ) ,  PARAMETER  ::  routineN  =  ' density_matrix_tc2 ' ,  & 

routineP  =  moduleN// //routineN 
INTEGER,  PARAMETER  : :  max_iter  =  100 


INTEGER  : :  estimated_steps ,  nne,  & 

handle,  i,  j,  unit_nr 

INTEGER ( kind=int_8 )  ::  flopl,  flop2 

LOGICAL  : :  converged,  do_dyn_threshold,  & 

do_non_monotonic 

REAL (KIND=dp)  ::  current_threshold,  eps_max,  eps_min,  est_threshold,  & 
frob_id,  frob_x,  homo,  lumo,  max_eig,  max_threshold,  maxdev,  & 
maxev,  min_eig,  minev,  mmin,  mu,  mu_a,  mu_b,  mu_c,  mu_fa,  mu_fc,  & 
occ_matrix,  scaled_homo_bound,  scaled_lumo_bound,  tl,  t2,  trace_fx,  & 
xi,  alpha,  beta,  betaB 

TYPE (cp_dbcsr_type)  ::  matrix_k0,  matrix_x,  & 

matrix_xidsq,  matrix_xsq 

TYPE (cp_logger_type) ,  POINTER  ::  logger 

CALL  timeset (routineN, handle) 


logger  =>  cp_error_get_logger (error) 

IF  (logger%para_env%mepos==logger%para_env%source)  THEN 

unit_nr=cp_logger_get_def ault_unit_nr (logger, local= . TRUE . ) 
ELSE 

unit_nr=-l 

ENDIF 


do_dyn_threshold  =  .FALSE. 
do_non_monotonic  =  .TRUE. 

IF (PRESENT (dynamic_threshold) )  do_dyn_threshold  =  dynamic_threshold 
IF (PRESENT (non_monotonic) )  do_non_monotonic  =  non_monotonic 

!  init  X  =  (eps_n*I  -  H)  /  (eps_n  -  eps_0)  ...  H*  =  S'^-l/2*H*S''-l/2 
CALL  cp_dbcsr_init (matrix_x, error=error) 

CALL  cp_dbcsr_create (matrix_x,  template=matrix_ks,  matrix_type=dbcsr_type_no_symmetry, 
error=error ) 

CALL  cp_dbcsr_multiply ( "N" ,  "N",  1.0_dp,  matrix_s_sqrt_inv,  matrix_ks,& 

0.0_dp,  matrix_x,  f ilter_eps=threshold, error=error ) 

CALL  cp_dbcsr_multiply ( "N" ,  "N",  1.0_dp,  matrix_x,  matrix_s_sqrt_inv,  & 

0.0_dp,  matrix_x,  f ilter_eps=threshold, error=error ) 
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CALL  cp_dbcsr_init (matrix_kO, error=error) 

CALL 

cp_dbcsr_create (matrix_kO , template=matrix_ks , matrix_type=dbcsr_type_no_symmetry, error=error ) 
CALL  cp_dbcsr_copy (matrix_kO ,  matrix_x,  error=error) 


!  compute  the  deviation  in  the  mixed  matrix,  as  seen  in  the  ortho  basis 

CALL  cp_dbcsr_add (matrix_ks_deviation,  matrix_x  ,  -1.0_dp,  1.0_dp,  error=error) 

CALL  lanczos_alg_serial (matrix_ks_deviation,  maxev,  minev,  max_iter=max_iter_lanczos, 
threshold=eps_lanczos ,  & 


(e 


converged=converged,  error=error ) 
maxdev  =  MAX (ABS (maxev) ,  ABS (minev) ) 


IF 


(unit_nr>0)  THEN 
WRITE (unit_nr 
WRITE (unit_nr 
WRITE (unit_nr 
WRITE (unit_nr 
WRITE (unit  nr 


'  (T6,A, 1X,L12)  ' ) 

’  (T6,A,1X,F12.5)  ’) 
’  (T6,A,1X,F12.5)  ’) 
’  (T6,A,1X,F12.5)  ’) 
’  (T6,A, 1X,L12)  '  ) 


homo+maxdev) ) >0 


"Lanczos  converged: 
"change  in  mixed  matrix: 
"HOMO  upper  bound: 

"LUMO  lower  bound: 
"Predicts  a  gap  ? 


converged 

maxdev 

e_homo+maxdev 
e_lumo-maxdev 
( (e_lumo-maxdev) - 


ENDIF 


!  save  the  old  mixed  matrix 

CALL  cp_dbcsr_copy (matrix_ks_deviation,  matrix_x  ,  error=error) 


!  get  largest/smallest  eigenvalues  for  scaling 

CALL  lanczos_alg_serial (matrix_x,  max_eig,  min_eig,  max_iter=max_iter_lanczos, 
threshold=eps_lanczos , & 


converged=converged,  error=error ) 

IF  (unit_nr>0)  WRITE (unit_nr , ' (T6 , A, IX, 2F12 . 5 , IX, A, IX, Ll ) ' )  "Est.  extremal  eigenvalues", 
&  min_eig,  max_eig, "  converged:  ",  converged 


eps_max  =  max_eig 
eps_min  =  min_eig 

!  scale  KS  matrix 

CALL  cp_dbcsr_scale (matrix_x,  -1.0_dp,  error=error) 

CALL  cp_dbcsr_add_on_diag (matrix_x,  eps_max,  error=error) 

CALL  cp_dbcsr_scale (matrix_x,  1/ (eps_max-eps_min) ,  error=error) 

!  scale  bounds  for  HOMO/LUMO 

scaled_homo_bound  =  (eps_max- (e_homo+maxdev) ) / (eps_max-eps_min) 
scaled_lumo_bound  =  (eps_max- (e_lumo-maxdev) ) / (eps_max-eps_min) 
current  threshold  =  threshold 


CALL  cp_dbcsr_init (matrix_xsq, error=error) 

CALL 

cp_dbcsr_create (matrix_xsq, template=matrix_ks , matrix_type=dbcsr_type_no_symmetry, error=error ) 

cp_dbcsr_create (matrix_xidsq, template=matrix_ks , matrix_type=dbcsr_type_no_symmetry, error=erro 
r)  “  “ 


beta=e_lumo 
betaB=e_homo 
alpha=l . 0_dp 

DO  i=l,max_iter 
tl  =  m_walltime() 
flopl  =  0;  flop2  =  0 

!  get  X*X 

CALL  cp_dbcsr_multiply ( "N" ,  "N",  1.0_dp,  matrix_x,  matrix_x,& 

0.0_dp,  matrix_xsq,  & 

f ilter_eps=current_threshold, f lop=f lopl , error=error ) 

!  intermediate  use  matrix_p  to  compute  =  X*X-X 
CALL  cp_dbcsr_copy (matrix_p,  matrix_x, error=error ) 

CALL  cp_dbcsr_add (matrix_p,  matrix_xsq,  -1.0_dp,  1.0_dp,  error=error) 
frob_id  =  cp_dbcsr_f robenius_norm (matrix_p) 
frob_x  =  cp_dbcsr_f robenius_norm (matrix_x) 
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CALL  cp_dbcsr_trace (matrix_x,  trace_fx,  error=error) 

!  quantities  used  for  dynamic  thresholding,  when  the  estimated  gap  is  larger  than  zero 
xi  =  (scaled_homo_bound-scaled_lumo_bound) 

IF  (do_dyn_threshold  .AND.  xi  >  0.0_dp)  THEN 

mmin  =  0 . 5* (scaled_homo_bound+scaled_lumo_bound) 
max_threshold  =  ABS (l-2*mmin) *xi 

scaled_homo_bound  =  evaluate_tc2_polynomial (scaled_homo_bound,  trace_fx,  nelectron) 
scaled_lumo_bound  =  evaluate_tc2_polynomial (scaled_lumo_bound,  trace_fx,  nelectron) 
estimated_steps  =  estimate_steps (scaled_homo_bound,  scaled_lumo_bound,  threshold) 

est_threshold  = 

(threshold/ (estimated_steps+i+l ) ) *xi/ ( 1+threshold/ (estimated_steps+i+l ) ) 
est_threshold  =  MIN (max_threshold,  est_threshold) 

IF  (i  >  1)  est_threshold  =  MAX (est_threshold,  0 . l_dp  *  current_threshold) 
current_threshold  =  est_threshold 
ELSE 

current_threshold  =  threshold 
ENDIF 


IF (do_non_monotonic)  THEN 
IF  (  trace_fx  >  nelectron  )  THEN 
!  Xn+1  =  (aX+  (1-a)  I)  "^2 
alpha=2/ (2-beta) 

CALL  cp_dbcsr_scale (matrix_xsq,  alpha**2,  error=error) 

CALL  cp_dbcsr_add_on_diag (matrix_xsq,  1 . 0-alpha**2 ,  error=error) 
CALL  cp_dbcsr_copy (matrix_x,  matrix_xsq,  error=error) 

CALL  cp_dbcsr_f liter (matrix_x,  current_threshold,  error=error) 

beta  =(alpha*beta  +l-alpha)**2 
betaB= (alpha*betaB+l-alpha) **2 


ELSE 

!  Xn+1  =  2aX-a'^2*X'^2 

alpha=2/ (1+betaB) 

CALL  cp_dbcsr_add (matrix_x,  matrix_xsq,  2 . 0_dp*alpha,  -1 . 0_dp*alpha**2 ,  error=error) 
CALL  cp_dbcsr_f liter (matrix_x,  current_threshold,  error=error) 

beta  =2*alpha*beta  -  (alpha*beta  )**2 
betaB=2*alpha*betaB- (alpha*betaB) **2 

ENDIF 


ELSE 

IF  (trace_fx  >  nelectron)  THEN 
!  Xn+1  =  X*X 

CALL  cp_dbcsr_copy (matrix_x,  matrix_xsq,  error=error) 

ELSE 

!  Xn+1  =  2X-X*X 

CALL  cp_dbcsr_add (matrix_x,  matrix_xsq,  2.0_dp,  -1.0_dp,  error=error) 
CALL  cp_dbcsr_f liter (matrix_x,  current_threshold,  error=error) 

ENDIF 

ENDIF 

occ_matrix  =  cp_dbcsr_get_occupation (matrix_x) 
t2  =  m_walltime() 

IF  (unit_nr>0)  THEN 
WRITE (unit_nr, & 

’ (T6,A, 13, IX, FIO. 8,E12 .3, F12 .3, F13.3,E12 .3) ' )  "TC2  it  ",  & 
i,  occ_matrix,  t2-tl,& 

(flopl+flop2) / (1. 0E6_dp* (t2-tl ) ) ,  current_threshold 
CALL  m_f lush (unit_nr ) 

ENDIF 
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IF  (unit_nr>0)  WRITE (unit_nr, ' (T6, A, 5E12 . 3) ' ) 

"Err" , f rob_id, f rob_x, threshold, ABS ( f rob_id/ f rob_x) , SQRT (threshold) 

IF  (ABS (f rob_id/f rob_x)  <  SQRT (threshold)  )  EXIT 

END  DO 

occ_matrix  =  cp_dbcsr_get_occupation (matrix_x) 

IF  (unit_nr>0)  WRITE (unit_nr,  ' (T6 , A, 13 , IX, FlO . 8 ) ' )  'Final  TC2  iteration  ',  i, 
occ  matrix 


!  free  some  memory 

CALL  cp_dbcsr_release (matrix_xsq,  error=error) 

!  CALL  cp_dbcsr_release (matrix_xidsq,  error=error) 

!  output  to  matrix_p,  P  =  inv(S)'^0.5  X  inv(S)'^0.5 

CALL  cp_dbcsr_multiply ( "N" ,  "N",  1.0_dp,  matrix_x,  matrix_s_sqrt_inv, & 

0.0_dp,  matrix_p,  f ilter_eps=threshold, error=error ) 

CALL  cp_dbcsr_multiply ( "N" ,  "N",  1.0_dp,  matrix_s_sqrt_inv,  matrix_p,& 

0.0_dp,  matrix_p,  f ilter_eps=threshold, error=error ) 

compute_homo_lumo (matrix_kO , matrix_x, eps_min, eps_max, threshold, max_iter_lanczos , eps_lanczos , h 
omo, lumo, unit_nr , error) 

e_homo  =  homo 
e_lumo  =  lumo 

beta=lumo 

betaB=homo 


CALL  cp_dbcsr_release (matrix_x,  error=error) 
CALL  cp_dbcsr_release (matrix_kO ,  error=error) 
CALL  timestop (handle) 

END  SUBROUTINE  density_matrix_tc2 


FUNCTION  evaluate_tc2_polynomial (x 
REAL (KIND=dp) 

INTEGER 
REAL (KIND=dp) 

INTEGER 


trace_fx, nelectron)  RESULT  (xr) 
: :  X, trace_fx 
: :  i , nelectron 

:  :  xr 


i=l 

xr  =  X 
DO  k=l,i 

IF  (trace_fx  <  nelectron)  THEN 
xr  =  2*xr-xr**2 
ELSE 

xr  =  xr**2 
ENDIF 
ENDDO 

END  FUNCTION  evaluate_tc2_polynomial 
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